54 research outputs found
Kafka, Samza and the Unix Philosophy of Distributed Data
Apache Kafka is a scalable message broker, and Apache Samza is a stream processing framework built upon Kafka. They are widely used as infrastructure for implementing personalized online services and real-time predictive analytics. Besides providing high throughput and low latency, Kafka and Samza are designed with operational robustness and long-term maintenance of applications in mind. In this paper we explain the reasoning behind the design of Kafka and Samza, which allow complex applications to be built by composing a small number of simple primitives – replicated logs and stream operators. We draw parallels between the design of Kafka and Samza, batch processing pipelines, database architecture, and the design philosophy of Unix.
A Conflict-Free Replicated JSON Datatype
Many applications model their data in a general-purpose storage format such as JSON. This data structure is modified by the application as a result of user input. Such modifications are well understood if performed sequentially on a single copy of the data, but if the data is replicated and modified concurrently on multiple devices, it is unclear what the semantics should be. In this paper we present an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state (a conflict-free replicated datatype or CRDT). It supports arbitrarily nested list and map types, which can be modified by insertion, deletion and assignment. The algorithm performs all merging client-side and does not depend on ordering guarantees from the network, making it suitable for deployment on mobile devices with poor network connectivity, in peer-to-peer networks, and in messaging systems with end-to-end encryption. This research was supported by a grant from The Boeing Company.
Verifying Strong Eventual Consistency in Distributed Systems
Data replication is used in distributed systems to maintain up-to-date copies of shared data across multiple
computers in a network. However, despite decades of research, algorithms for achieving consistency in
replicated systems are still poorly understood. Indeed, many published algorithms have later been shown to
be incorrect, even some that were accompanied by supposed mechanised proofs of correctness. In this work,
we focus on the correctness of Conflict-free Replicated Data Types (CRDTs), a class of algorithm that provides
strong eventual consistency guarantees for replicated data. We develop a modular and reusable framework
in the Isabelle/HOL interactive proof assistant for verifying the correctness of CRDT algorithms. We avoid
correctness issues that have dogged previous mechanised proofs in this area by including a network model
in our formalisation, and proving that our theorems hold in all possible network behaviours. Our axiomatic
network model is a standard abstraction that accurately reflects the behaviour of real-world computer networks.
Moreover, we identify an abstract convergence theorem, a property of order relations, which provides a formal
definition of strong eventual consistency. We then obtain the first machine-checked correctness theorems for
three concrete CRDTs: the Replicated Growable Array, the Observed-Remove Set, and an Increment-Decrement
Counter. We find that our framework is highly reusable, developing proofs of correctness for the latter two
CRDTs in a few hours and with relatively little CRDT-specific code.
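The convergence property mentioned in this abstract can be illustrated with a small sketch of an increment-decrement counter. This is not the paper's Isabelle/HOL formalisation; the function names `apply_op` and `replay` and the string encoding of operations are assumptions made for illustration only. It assumes every replica eventually receives each operation exactly once, and checks that every delivery order produces the same final value because the operations commute.

```python
from itertools import permutations


def apply_op(state: int, op: str) -> int:
    """Apply one counter operation (hypothetical encoding: 'inc' or 'dec')."""
    if op == "inc":
        return state + 1
    if op == "dec":
        return state - 1
    raise ValueError(f"unknown operation: {op}")


def replay(ops) -> int:
    """Apply a sequence of operations to an initially zero counter."""
    state = 0
    for op in ops:
        state = apply_op(state, op)
    return state


# Four operations, delivered to different replicas in different orders.
ops = ["inc", "inc", "dec", "inc"]

# Collect the final state reached under every possible delivery order.
final_states = {replay(order) for order in permutations(ops)}
```

If the operations did not commute, `final_states` would contain more than one value; for this counter it contains a single value regardless of delivery order.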
A Critique of the CAP Theorem
The CAP Theorem is a frequently cited impossibility result in distributed
systems, especially among NoSQL distributed databases. In this paper we survey
some of the confusion about the meaning of CAP, including inconsistencies and
ambiguities in its definitions, and we highlight some problems in its
formalization. CAP is often interpreted as proof that eventually consistent
databases have better availability properties than strongly consistent
databases; although there is some truth in this, we show that more careful
reasoning is required. These problems cast doubt on the utility of CAP as a
tool for reasoning about trade-offs in practical systems. As an alternative to
CAP, we propose a "delay-sensitivity" framework, which analyzes the sensitivity
of operation latency to network delay, and which may help practitioners reason
about the trade-offs between consistency guarantees and tolerance of network
faults.
Interleaving anomalies in collaborative text editors
Collaborative text editors allow two or more users to concurrently edit a shared document without merge conflicts. Such systems require an algorithm to provide convergence, ensuring all clients that have seen the same set of document edits are in the same state. Unfortunately, convergence alone does not guarantee that a collaborative text editor is usable. Several published algorithms for collaborative text editing exhibit an undesirable anomaly in which concurrently inserted portions of text with a well-defined order may be randomly interleaved on a character-by-character basis, resulting in an unreadable jumble of letters. Although this anomaly appears to be known informally by some researchers in the field, we are not aware of any published work that fully explains or addresses it. We show that several algorithms suffer from this problem, explain its cause, and also identify a lesser variant of the anomaly that occurs in another algorithm. Moreover, we propose a specification of collaborative text editing that rules out the anomaly, and show how to prevent the lesser anomaly from occurring in one particular algorithm. Funding: The Boeing Company and EPSRC “REMS: Rigorous Engineering for Mainstream Systems” programme grant (EP/K008528).
Ghost trace on the wire? Using key evidence for informed decisions
Modern smartphone messaging apps now use end-to-end encryption to provide authenticity, integrity and confidentiality.
Consequently, the preferred strategy for wiretapping such apps is to insert a ghost user by compromising the platform's public key infrastructure.
The use of warning messages alone is not a good defence against a ghost user attack since users change smartphones, and therefore keys, regularly, leading to a multitude of warning messages which are overwhelmingly false positives.
Consequently, these false positives discourage users from viewing warning messages as evidence of a ghost user attack.
To address this problem, we propose collecting evidence from a variety of sources, including direct communication between smartphones over local networks and CONIKS, to reduce the number of false positives and increase confidence in key validity.
When there is enough confidence to suggest a ghost user attack has taken place, we can then supply the user with evidence to help them make a more informed decision.
A Highly-Available Move Operation for Replicated Trees
Replicated tree data structures are a fundamental building block of distributed filesystems, such as Google Drive and Dropbox, and collaborative applications with a JSON or XML data model. These systems need to support a move operation that allows a subtree to be moved to a new location within the tree. However, such a move operation is difficult to implement correctly if different replicas can concurrently perform arbitrary move operations, and we demonstrate bugs in Google Drive and Dropbox that arise with concurrent moves. In this paper we present a CRDT algorithm that handles arbitrary concurrent modifications on trees, while ensuring that the tree structure remains valid (in particular, no cycles are introduced), and guaranteeing that all replicas converge towards the same consistent state. Our algorithm requires no synchronous coordination between replicas, making it highly available in the face of network partitions. We formally prove the correctness of our algorithm using the Isabelle/HOL proof assistant, and evaluate the performance of our formally verified implementation in a geo-replicated setting. Funding: The Boeing Company; EPSRC “REMS: Rigorous Engineering for Mainstream Systems” programme grant (EP/K008528); Leverhulme Trust Early Career Fellowship, Isaac Newton Trust; Nokia Bell Labs.
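The tree-validity requirement in this abstract (no cycles after a move) can be sketched as a local safety check. This is not the paper's algorithm: the abstract does not describe how replicas order or reconcile concurrent moves, so that part is omitted. The child-to-parent dictionary and the names `is_ancestor` and `apply_move` are assumptions made for illustration.

```python
def is_ancestor(parent_of, ancestor, node):
    """Return True if `ancestor` lies on the path from `node` up to the root."""
    current = parent_of.get(node)
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of.get(current)
    return False


def apply_move(parent_of, child, new_parent):
    """Move `child` under `new_parent`, skipping moves that would create a cycle.

    Returns True if the move was applied and False if it was skipped.
    """
    if child == new_parent or is_ancestor(parent_of, child, new_parent):
        return False
    parent_of[child] = new_parent
    return True


# Hypothetical tree: "a" and "b" are both children of "root".
tree = {"a": "root", "b": "root"}

# Two conflicting moves, as might be issued concurrently on different replicas.
first = apply_move(tree, "a", "b")   # move a under b
second = apply_move(tree, "b", "a")  # would make a and b each other's ancestor
```

Applying both moves blindly would detach `a` and `b` from the root and leave them in a cycle; with the check, the second move is skipped and the tree stays valid. A replicated implementation additionally needs all replicas to agree on which of the conflicting moves takes effect, which this sketch does not address.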
From Secure Messaging to Secure Collaboration
© 2018, Springer Nature Switzerland AG. We examine the security of collaboration systems, where several users access and contribute to some shared resource, document, or database. To protect such systems against malicious servers, we can build upon existing secure messaging protocols that provide end-to-end security. However, if we want to ensure the consistency of the shared data in the presence of malicious users, we require features that are not available in existing messaging protocols. We investigate the protocol failures that may arise when a new collaborator is added to a group, and discuss approaches for enforcing the integrity of the shared data.
Correlation energy of an electron gas in strong magnetic fields at high densities
The high-density electron gas in a strong magnetic field B and at zero
temperature is investigated. The quantum strong-field limit is considered in
which only the lowest Landau level is occupied. It is shown that the
perturbation series of the ground-state energy can be represented in analogy to
the Gell-Mann Brueckner expression of the ground-state energy of the field-free
electron gas. The role of the expansion parameter is taken by r_B= (2/3 \pi^2)
(B/m^2) (\hbar r_s /e)^3 instead of the field-free Gell-Mann Brueckner
parameter r_s. The perturbation series is given exactly up to o(r_B) for the
case of a small filling factor for the lowest Landau level. Comment: 10 pages, accepted for publication in Phys. Rev.
Magnetic-field-induced Luttinger liquid
It is shown that a strong magnetic field applied to a bulk metal induces a
Luttinger-liquid phase. This phase is characterized by the zero-bias anomaly in
tunneling: the tunneling conductance scales as a power-law of voltage or
temperature. The tunneling exponent increases with the magnetic field as B ln B.
The zero-bias anomaly is most pronounced for tunneling with the field applied
perpendicular to the plane of the tunneling junction. Comment: a reference added, minor typos corrected